Building automated vandalism detection tools for Wikidata
Wikidata, like Wikipedia, is a knowledge base that anyone can edit. This open
collaboration model is powerful in that it reduces barriers to participation
and allows a large number of people to contribute. However, it exposes the
knowledge base to the risk of vandalism and low-quality contributions. In this
work, we build on past work detecting vandalism in Wikipedia to detect
vandalism in Wikidata. This work is novel in that identifying damaging changes
in a structured knowledge-base requires substantially different feature
engineering work than in a text-based wiki like Wikipedia. We also discuss the
utility of these classifiers for reducing the overall workload of vandalism
patrollers in Wikidata. We describe a machine classification strategy that is
able to catch 89% of vandalism while reducing patrollers' workload by 98%, by
drawing lightly from contextual features of an edit and heavily from the
characteristics of the user making the edit.
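The workload-reduction claim above comes from triaging: a classifier scores every edit, and only edits above a score threshold are routed to human patrollers. The sketch below illustrates that idea with made-up scores and labels; the function name, data, and threshold are all hypothetical, not the paper's actual pipeline or the real ORES features.

```python
# Hypothetical triage sketch: score each edit, send only high-scoring
# edits to patrollers, and measure recall vs. review workload.

def triage_stats(scores, is_vandalism, threshold):
    """Return (vandalism recall, fraction of edits flagged for review)."""
    flagged = [s >= threshold for s in scores]
    caught = sum(1 for f, v in zip(flagged, is_vandalism) if f and v)
    total_vandalism = sum(is_vandalism)
    recall = caught / total_vandalism if total_vandalism else 0.0
    workload = sum(flagged) / len(scores)
    return recall, workload

# Toy data: 10 edits, two of which are vandalism with high scores.
scores = [0.05, 0.10, 0.92, 0.03, 0.88, 0.07, 0.15, 0.02, 0.20, 0.04]
labels = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
recall, workload = triage_stats(scores, labels, threshold=0.5)
# Both vandalism edits score above 0.5, so recall is 1.0 while only
# 2 of 10 edits (20%) need human review.
```

Raising the threshold shrinks the patrollers' queue but risks missing vandalism; the 89%/98% figures reflect one such operating point.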
PreCall: A Visual Interface for Threshold Optimization in ML Model Selection
Machine learning systems are ubiquitous in various kinds of digital
applications and have a huge impact on our everyday life. But a lack of
explainability and interpretability of such systems hinders meaningful
participation by people, especially by those without a technical background.
Interactive visual interfaces (e.g., providing means for manipulating
parameters in the user interface) can help tackle this challenge. In this paper
we present PreCall, an interactive visual interface for ORES, a machine
learning-based web service for Wikimedia projects such as Wikipedia. While ORES
can be used for a number of settings, it can be challenging to translate
requirements from the application domain into formal parameter sets needed to
configure the ORES models. Assisting Wikipedia editors in finding damaging
edits, for example, can be realized at various stages of automatization, which
might impact the precision of the applied model. Our prototype PreCall attempts
to close this translation gap by interactively visualizing the relationship
between major model metrics (recall, precision, false positive rate) and a
parameter (the threshold between valuable and damaging edits). Furthermore,
PreCall visualizes the probable results for the current model configuration to
improve the human's understanding of the relationship between metrics and
outcome when using ORES. We describe PreCall's components and present a use
case that highlights the benefits of our approach. Finally, we pose further
research questions we would like to discuss during the workshop.
Comment: HCML Perspectives Workshop at CHI 2019, May 04, 2019, Glasgow
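The metrics PreCall visualizes are all functions of a single classification threshold. As a minimal sketch of that relationship (toy probabilities and labels, not the ORES models or PreCall's actual code), the following computes recall, precision, and false positive rate at a chosen threshold:

```python
# Sketch: recall, precision, and false positive rate as functions of a
# classification threshold, the tradeoff PreCall visualizes.

def metrics_at_threshold(probs, labels, threshold):
    """Confusion-matrix metrics for binary labels at one threshold."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, labels):
        pred = p >= threshold
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif not pred and y:
            fn += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, precision, fpr

# Toy model outputs: three damaging edits, two good ones.
probs = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 0]
recall, precision, fpr = metrics_at_threshold(probs, labels, 0.5)
```

Sweeping the threshold and plotting these three values against it yields the kind of curve an interface like PreCall lets users explore interactively instead of configuring by hand.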
On Improving Summarization Factual Consistency from Natural Language Feedback
Despite the recent progress in language generation models, their outputs may
not always meet user expectations. In this work, we study whether informational
feedback in natural language can be leveraged to improve generation quality and
user preference alignment. To this end, we consider factual consistency in
summarization, the quality that the summary should only contain information
supported by the input documents, as the user-expected preference. We collect a
high-quality dataset, DeFacto, containing human demonstrations and
informational natural language feedback consisting of corrective instructions,
edited summaries, and explanations with respect to the factual consistency of
the summary. Using our dataset, we study three natural language generation
tasks: (1) editing a summary by following the human feedback, (2) generating
human feedback for editing the original summary, and (3) revising the initial
summary to correct factual errors by generating both the human feedback and
edited summary. We show that DeFacto can provide factually consistent
human-edited summaries and further insights into summarization factual
consistency thanks to its informational natural language feedback. We further
demonstrate that fine-tuned language models can leverage our dataset to improve
the summary factual consistency, while large language models lack the zero-shot
learning ability in our proposed tasks that require controllable text
generation.
Comment: ACL 2023 Camera Ready, GitHub Repo: https://github.com/microsoft/DeFact
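The three tasks above can be seen as different input/output slices of one annotated record. The sketch below shows one hypothetical DeFacto-style record and how each task maps onto it; the field names and example text are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical record in the spirit of DeFacto: a summary with a factual
# error, plus human feedback and a corrected summary. Illustrative only.
record = {
    "document": "The plant opened in 1990 and employs 2,000 workers.",
    "summary": "The plant opened in 1995.",  # factually inconsistent
    "feedback": "Change the opening year from 1995 to 1990.",
    "edited_summary": "The plant opened in 1990.",
}

# Task 1: edit the summary by following the human feedback.
task1 = {"input": (record["document"], record["summary"], record["feedback"]),
         "output": record["edited_summary"]}

# Task 2: generate the feedback needed to fix the original summary.
task2 = {"input": (record["document"], record["summary"]),
         "output": record["feedback"]}

# Task 3: generate both the feedback and the edited summary.
task3 = {"input": (record["document"], record["summary"]),
         "output": (record["feedback"], record["edited_summary"])}
```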
NICE: Social translucence through UI intervention
Social production systems such as Wikipedia rely on attracting and motivating volunteer contributions to be successful. One strong demotivating factor can be when an editor's work is discarded, or "reverted", by others. In this paper we demonstrate evidence of this effect and design a novel interface aimed at improving communication between the reverting and reverted editors. We deployed the interface in a controlled experiment on the live Wikipedia site, and report on changes in the behavior of 487 contributors who were reverted by editors using our interface. Our results suggest that simple interface modifications (such as informing Wikipedians that the editor they are reverting is a newcomer) can have substantial positive effects in protecting against contribution loss in newcomers and improving the quality of work done by more experienced contributors.
Open Community Health: Workshop Report
This report summarizes key outcomes from a workshop on open community health conducted at the University of Nebraska at Omaha in April 2018. Workshop members represented research and practice communities across Citizen Science, Open Source, and Wikipedia. The outcomes from the workshop include (1) comparisons among these communities, (2) how a shared understanding and assessment of open community health can be developed, and (3) a taxonomic comparison to begin a conversation between these communities, which have developed disparate languages.
Not at Home on the Range: Peer Production and the Urban/Rural Divide
Wikipedia articles about places, OpenStreetMap features, and other forms of peer-produced content have become critical sources of geographic knowledge for humans and intelligent technologies. In this paper, we explore the effectiveness of the peer production model across the rural/urban divide, a divide that has been shown to be an important factor in many online social systems. We find that in both Wikipedia and OpenStreetMap, peer-produced content about rural areas is of systematically lower quality, is less likely to have been produced by contributors who focus on the local area, and is more likely to have been generated by automated software agents (i.e. "bots"). We then codify the systemic challenges inherent to characterizing rural phenomena through peer production and discuss potential solutions.
Garbage In, Garbage Out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes From?
Many machine learning projects for new application areas involve teams of
humans who label data for a particular purpose, from hiring crowdworkers to the
paper's authors labeling the data themselves. Such a task is quite similar to
(or a form of) structured content analysis, which is a longstanding methodology
in the social sciences and humanities, with many established best practices. In
this paper, we investigate to what extent a sample of machine learning
application papers in social computing --- specifically papers from ArXiv and
traditional publications performing an ML classification task on Twitter data
--- give specific details about whether such best practices were followed. Our
team conducted multiple rounds of structured content analysis of each paper,
making determinations such as: does the paper report who the labelers were,
what their qualifications were, whether they independently labeled the same
items, whether inter-rater reliability metrics were disclosed, what level of
training and/or instructions was given to labelers, whether compensation for
crowdworkers was disclosed, and whether the training data is publicly available.
We find a wide divergence in whether such practices were followed and documented.
Much of machine learning research and education focuses on what is done once a
"gold standard" of training data is available, but we discuss issues around the
equally important aspect of whether such data is reliable in the first place.
Comment: 18 pages, includes appendix
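One of the best practices the paper checks for is reporting inter-rater reliability. A standard such metric for two raters is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is a minimal from-scratch version for binary labels, not the paper's own analysis code:

```python
# Sketch: Cohen's kappa for two raters assigning binary labels,
# a common inter-rater reliability metric.

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two lists of 0/1 labels."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal rate of labeling 1.
    p_a1 = sum(rater_a) / n
    p_b1 = sum(rater_b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected)

# Perfect agreement yields kappa = 1.0; agreement no better than
# chance yields kappa = 0.0.
perfect = cohens_kappa([1, 1, 0, 0], [1, 1, 0, 0])
chance = cohens_kappa([1, 0, 1, 0], [1, 1, 0, 0])
```

Disclosing a value like this (or a multi-rater analogue such as Krippendorff's alpha) is one of the content-analysis practices whose reporting the paper audits.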